About

Analytical objectives:

The 2016 election was one of the most polarizing in our nation’s history. A wealth of data is available to help us understand the landscape of the 2016 election and why voters backed Donald Trump.

The first section, “A Polarized Nation,” highlights demographic and economic variables that distinguish Democratic and Republican counties. While Republican counties tend to have lower incomes than Democratic ones, they also have lower poverty rates. The white share of the population is a strong indicator of a county’s outcome.

The second section, “News of the 2016 Presidential Election,” examines the text published by the liberal-leaning New York Times and the conservative-leaning Wall Street Journal (WSJ) from June to November 2016, using sentiment analysis, TF-IDF scoring, and bigram analysis. The two publications show distinct differences.

The third section, “2016 Split-Ticket Voting,” shows choropleth maps of every level of election held in 2016, as well as the vote difference between the Presidential race and each of the other races. Western Wisconsin supported Democratic House candidates while leaning Republican in the Presidential race. In the South, local elections with available data were consistently more Republican than those counties’ Presidential votes.

The final section, “County Factors in 2016 Outcome,” uses association rules mining as a correlational method to identify demographics commonly associated with Democratic or Republican wins in counties. This brings the narrative full circle, as many of the demographics considered in the first section are strongly associated with polarized outcomes.

Data sources and software:

Election data

Zip files were downloaded from the following sites. The data files were already in a clean CSV format.

Data processing included the following:

  • Aggregating vote totals to the county level
  • Creating Democratic/Republican vote percentages
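The two processing steps above can be sketched as follows. This is a minimal illustration in plain Python, not the dashboard’s actual code: the row layout (`fips`, `party`, `votes`), the party labels, and the use of total votes (including third parties) as the denominator are all assumptions. A two-party share would divide by Democratic plus Republican votes instead.

```python
from collections import defaultdict


def county_vote_shares(rows):
    """Sum sub-county vote counts to the county level and compute party shares.

    Each row is assumed to look like {"fips": "55001", "party": "democrat", "votes": 120};
    these field names and party labels are hypothetical, not the source files' schema.
    """
    totals = defaultdict(lambda: defaultdict(int))  # fips -> party -> votes
    for row in rows:
        totals[row["fips"]][row["party"]] += row["votes"]

    shares = {}
    for fips, by_party in totals.items():
        all_votes = sum(by_party.values())  # denominator includes third-party votes
        if all_votes == 0:
            continue  # skip counties with no recorded votes
        shares[fips] = {
            "total": all_votes,
            "dem_pct": 100 * by_party["democrat"] / all_votes,
            "rep_pct": 100 * by_party["republican"] / all_votes,
        }
    return shares


# Made-up example rows: two precincts in one county, one precinct in another.
rows = [
    {"fips": "55001", "party": "democrat", "votes": 120},
    {"fips": "55001", "party": "republican", "votes": 180},
    {"fips": "55001", "party": "other", "votes": 20},
    {"fips": "55001", "party": "democrat", "votes": 80},
    {"fips": "55001", "party": "republican", "votes": 100},
    {"fips": "55003", "party": "democrat", "votes": 50},
    {"fips": "55003", "party": "republican", "votes": 50},
]
shares = county_vote_shares(rows)
print(shares["55001"])  # {'total': 500, 'dem_pct': 40.0, 'rep_pct': 56.0}
```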

ACS data

Data was downloaded from IPUMS using its interactive online extract tool. The data covers 2005 to 2016 because those are the years that include county FIPS codes. The following variables were used:

Data was aggregated to the county level using statistics weighted by the person weight variable.
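A minimal sketch of person-weighted aggregation, assuming person-level records that carry the IPUMS USA variables `STATEFIP`, `COUNTYFIP`, and `PERWT` (the person weight). Which variables were aggregated, and whether weighted means or other weighted statistics such as medians were used, is not stated here, so the weighted mean and the `INCTOT`/`white` columns are illustrative. Records whose county is not identified, and IPUMS missing-value codes, would need to be filtered out before this step.

```python
from collections import defaultdict


def county_fips(rec):
    """Five-digit county FIPS: two-digit state code followed by three-digit county code."""
    return f"{rec['STATEFIP']:02d}{rec['COUNTYFIP']:03d}"


def weighted_county_means(records, value_cols, weight_col="PERWT"):
    """Person-weighted mean of each column in `value_cols`, per county.

    For a 0/1 indicator column, the weighted mean is the weighted population share.
    """
    weight_sums = defaultdict(float)
    value_sums = defaultdict(lambda: defaultdict(float))
    for rec in records:
        key = county_fips(rec)
        w = rec[weight_col]
        weight_sums[key] += w
        for col in value_cols:
            value_sums[key][col] += w * rec[col]
    return {
        key: {col: value_sums[key][col] / total_w for col in value_cols}
        for key, total_w in weight_sums.items()
        if total_w > 0
    }


# Made-up person records; "white" is a hypothetical 0/1 indicator derived from a race variable.
people = [
    {"STATEFIP": 55, "COUNTYFIP": 1, "PERWT": 100, "INCTOT": 30000, "white": 1},
    {"STATEFIP": 55, "COUNTYFIP": 1, "PERWT": 300, "INCTOT": 50000, "white": 0},
    {"STATEFIP": 55, "COUNTYFIP": 3, "PERWT": 50, "INCTOT": 40000, "white": 1},
]
stats = weighted_county_means(people, ["INCTOT", "white"])
print(stats["55001"])  # {'INCTOT': 45000.0, 'white': 0.25}
```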

ACS data from IPUMS USA, University of Minnesota, www.ipums.org

News data

Data was downloaded from Factiva in 100-article chunks. The search parameters were as follows:

  • Text search: election AND (trump OR clinton)
  • Date range: 06-01-2016 to 11-08-2016
  • Source: The New York Times OR The Wall Street Journal

The search returned 3,013 results. The raw data was downloaded in RTF format and converted to plain text using the striprtf package in Python. The text was then cleaned, tokenized, and stemmed, and stop words were removed, using nltk. Sentiment was calculated using nltk’s VADER sentiment analyzer, and TF-IDF analysis was performed using the nltk package. TensorFlow was used to train a bidirectional LSTM neural network that predicts the publication from the cleaned, tokenized text.
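The text steps above can be illustrated with a dependency-free sketch: regex tokenization, stop-word removal, a basic tf × log(N/df) TF-IDF score, and bigram counts. It is not the nltk pipeline itself. The stop-word set is a small stand-in for nltk’s list, stemming (for example with nltk’s `PorterStemmer`) is omitted, and library TF-IDF implementations differ in smoothing and normalization (scikit-learn’s `TfidfVectorizer`, for instance, smooths the IDF and L2-normalizes by default). Treating each publication as one document is also an assumption.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word set; a real pipeline would use nltk's English stop-word list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "was"}


def clean_tokens(text):
    """Lowercase, keep alphabetic tokens, and drop stop words (no stemming)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: score} dict per token list, scored as tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_bigrams(tokens, k=10):
    """Return the k most frequent adjacent token pairs with their counts."""
    return Counter(zip(tokens, tokens[1:])).most_common(k)


# Made-up snippets standing in for one document per publication.
nyt = clean_tokens("Clinton leads in the polls. Clinton campaign rallies.")
wsj = clean_tokens("Trump campaign rallies and Trump tax plan.")
scores = tf_idf([nyt, wsj])
print(max(scores[1], key=scores[1].get))  # trump
```

Terms that appear in every document (here "campaign" and "rallies") score zero, which is why TF-IDF surfaces vocabulary that distinguishes one publication from the other. Because bigrams are built after stop-word removal, a pair can bridge a removed word or a sentence boundary; splitting text into sentences first avoids the latter.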

Software versions:

R packages
  • tidyverse==1.3.0
  • sf==0.9.6
  • reticulate==1.16
  • rmarkdown==2.4.6
  • flexdashboard==0.5.2
  • ggplot2==3.3.2
  • pacman==0.5.1
Python packages
  • python==3.8.5
  • striprtf==0.0.12
  • pandas==1.0.5
  • numpy==1.18.5
  • dateutil==2.8.1
  • seaborn==0.11.0
  • matplotlib==3.2.2
  • nltk==3.5
  • plotly==4.10.0
  • re==2.2.1
  • sklearn==0.23.1
  • tensorflow==2.3.1

A Polarized Nation

Poverty rates

Racial and income differences

Hispanic population

Veteran distribution

Medicare vote

News of the 2016 Presidential Election

Negative Sentiment Timeline (if the graph is cut off, zoom out in your browser)

Positive Sentiment Timeline (if the graph is cut off, zoom out in your browser)

TF-IDF Analysis

Bigram Analysis

2016 Split-Ticket Voting

Presidential Election

U.S. Senate Election

U.S. House Election

State-level Election

Local-level Election

President vs U.S. House

President vs U.S. Senate

President vs State

President vs Local
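The President vs. down-ballot comparisons can be sketched as follows, assuming each race is summarized per county as Democratic and Republican vote counts. The metric here (the down-ballot Republican two-party margin minus the presidential Republican two-party margin, in percentage points) is an assumption, not necessarily the dashboard’s formula, and the vote counts are made up. A negative value means the down-ballot race was more Democratic than the presidential race, the pattern described for Western Wisconsin House races. Counties missing from either race, such as those without local data, are skipped.

```python
def rep_margin(dem_votes, rep_votes):
    """Republican minus Democratic share of the two-party vote, in percentage points."""
    return 100 * (rep_votes - dem_votes) / (dem_votes + rep_votes)


def split_ticket_gap(pres, other):
    """Per-county gap: other race's Republican margin minus the presidential margin.

    `pres` and `other` map county FIPS -> (dem_votes, rep_votes). Only counties present
    in both races with at least one two-party vote are returned.
    """
    return {
        fips: rep_margin(*other[fips]) - rep_margin(*pres[fips])
        for fips in pres.keys() & other.keys()
        if sum(pres[fips]) > 0 and sum(other[fips]) > 0
    }


# Made-up counts: the House race in 55001 is more Democratic than the presidential race.
pres = {"55001": (40, 60), "55003": (50, 50)}
house = {"55001": (55, 45), "55005": (10, 90)}
gaps = split_ticket_gap(pres, house)
print(gaps)  # {'55001': -30.0}
```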

County Factors in 2016 Outcome

Introduction

Association rules are a method for expressing IF-THEN correlations. This network graph shows ACS demographic variables and the rules that correlate them with U.S. counties picking the Democratic or Republican candidate in the 2016 election. The shading of each rule indicates the degree of dependence among the variables in that rule. Hover over a rule to see its statistics, or hover over any node to see its direct connections.

For background on the metrics, read here
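The metrics usually reported for an association rule A ⇒ B are support (the share of counties containing both A and B), confidence (the support of A and B divided by the support of A), and lift (confidence divided by the support of B, where values above 1 indicate a positive association). The sketch below computes them for a single rule, assuming each county has been discretized into a set of items; the item names and the four counties are hypothetical. Apriori-based tools such as R’s arules package report these same measures, but the thresholds and discretization behind this graph are not stated here.

```python
def support(transactions, items):
    """Fraction of transactions (counties) that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def rule_metrics(transactions, lhs, rhs):
    """Support, confidence, and lift for the rule lhs => rhs."""
    s_lhs = support(transactions, lhs)
    s_rhs = support(transactions, rhs)
    if s_lhs == 0 or s_rhs == 0:
        raise ValueError("both sides of the rule must occur in at least one transaction")
    s_both = support(transactions, set(lhs) | set(rhs))
    confidence = s_both / s_lhs
    return {"support": s_both, "confidence": confidence, "lift": confidence / s_rhs}


# Hypothetical discretized counties: demographic bins plus the winning party.
counties = [
    {"white_high", "income_low", "winner_rep"},
    {"white_high", "income_mid", "winner_rep"},
    {"white_high", "income_low", "winner_dem"},
    {"white_low", "income_high", "winner_dem"},
]
metrics = rule_metrics(counties, {"white_high"}, {"winner_rep"})
print(metrics)  # support 0.5, confidence about 0.667, lift about 1.333
```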

Interactive association rules output